561 research outputs found

High Performance Computing of Gene Regulatory Networks using a Message-Passing Model

Author: Glass Kimberly
Kepner Jeremy
Quackenbush John
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/07/2015

Gene regulatory network reconstruction is a fundamental problem in computational biology. We recently developed an algorithm, called PANDA (Passing Attributes Between Networks for Data Assimilation), that integrates multiple sources of 'omics data and estimates regulatory network models. This approach was initially implemented in the C++ programming language and has since been applied to a number of biological systems. In our current research we are beginning to expand the algorithm to incorporate larger and more diverse data-sets, to reconstruct networks that contain increasing numbers of elements, and to build not only single network models, but sets of networks. In order to accomplish these "Big Data" applications, it has become critical that we increase the computational efficiency of the PANDA implementation. In this paper we show how to recast PANDA's similarity equations as matrix operations. This allows us to implement a highly readable version of the algorithm using the MATLAB/Octave programming language. We find that the resulting M-code is much shorter (103 compared to 1128 lines) and more easily modifiable for potential future applications. The new implementation also runs significantly faster, with increasing efficiency as the network models increase in size. Tests comparing the C-code and M-code versions of PANDA demonstrate that this speed-up is on the order of 20-80 times for networks of similar dimensions to those we find in current biological applications.
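The idea of recasting a pairwise similarity as matrix operations can be illustrated with a small sketch. The abstract does not give PANDA's equations, so the Tanimoto-style similarity below, and the names `tanimoto_loops` and `tanimoto_matrix`, are assumptions chosen for illustration only. The sketch uses plain Python lists to stay self-contained; it shows the equivalence of the two formulations, not the speed-up reported above.

```python
import math

def tanimoto_loops(rows):
    """Pairwise similarity computed one pair at a time (illustrative formula)."""
    n = len(rows)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            norm_i = sum(a * a for a in rows[i])
            norm_j = sum(b * b for b in rows[j])
            out[i][j] = dot / math.sqrt(norm_i + norm_j - abs(dot))
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def tanimoto_matrix(rows):
    """Same similarity, written as operations on the Gram matrix A * A^T."""
    gram = matmul(rows, transpose(rows))          # all pairwise dot products
    norms = [gram[i][i] for i in range(len(rows))]  # squared norms on the diagonal
    n = len(rows)
    return [
        [gram[i][j] / math.sqrt(norms[i] + norms[j] - abs(gram[i][j]))
         for j in range(n)]
        for i in range(n)
    ]
```

Both functions return the same matrix for non-zero input rows; in a numerical environment such as MATLAB/Octave, the second form maps onto a single matrix product followed by element-wise operations.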

arXiv.org e-Print Archive

Crossref

Data reporting standards: making the things we use better

Author: Quackenbush John
Publication venue: BioMed Central
Publication date: 01/01/2009

Genomic data often persist far beyond the initial study in which they were generated. But the true value of the data is tied to their being both used and useful, and the usefulness of the data relies intimately on how well annotated they are. While standards such as MIAME have been in existence for nearly a decade, we cannot think that the problem is solved or that we can ignore the need to develop better, more effective methods for capturing the essence of the meta-data that is ultimately required to guarantee utility of the data.

Crossref

PubMed Central

Weighing our measures of gene expression

Author: Quackenbush John
Publication date: 14/11/2006

Crossref

PubMed Central

Cascade Size Distributions: Why They Matter and How to Compute Them Efficiently

Author: Burkholz Rebekka
Quackenbush John
Publication date: 16/12/2020

Cascade models are central to understanding, predicting, and controlling epidemic spreading and information propagation. Related optimization, including influence maximization, model parameter inference, or the development of vaccination strategies, relies heavily on sampling from a model. This is either inefficient or inaccurate. As an alternative, we present an efficient message passing algorithm that computes the probability distribution of the cascade size for the Independent Cascade Model on weighted directed networks and generalizations. Our approach is exact on trees but can be applied to any network topology. It approximates locally tree-like networks well, scales to large networks, and can lead to surprisingly good performance on more dense networks, as we also exemplify on real-world data. Comment: Accepted at AAAI 202
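For the tree case mentioned in the abstract, the cascade size distribution can be computed exactly by combining child distributions bottom-up. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it handles only a directed tree with a single seed at the root, and the names `convolve`, `cascade_size_distribution`, and the `children` mapping are hypothetical.

```python
def convolve(p, q):
    """Distribution of the sum of two independent non-negative integer variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def cascade_size_distribution(children, root):
    """Return dist where dist[k] = P(cascade size == k) with `root` as the only seed.

    `children` maps a node to a list of (child, activation_probability) pairs
    and is assumed to describe a directed tree rooted at `root`.
    """
    def subtree(node):
        # Number of activated descendants, given that `node` is active.
        total = [1.0]
        for child, p in children.get(node, []):
            child_dist = subtree(child)
            # The edge fails with probability 1 - p and contributes nothing;
            # otherwise the child's whole subtree distribution applies.
            contribution = [1.0 - p] + [p * x for x in child_dist[1:]]
            total = convolve(total, contribution)
        # Shift by one to count `node` itself.
        return [0.0] + total
    return subtree(root)
```

For example, `cascade_size_distribution({"a": [("b", 0.5), ("c", 0.5)], "b": [("d", 1.0)]}, "a")` assigns probability 0.25 to each of the sizes 1 through 4. Networks with cycles or multiple seeds need the more general message passing described in the paper.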

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Estimating sample-specific regulatory networks

Author: Glass Kimberly
Kuijjer Marieke Lydia
Quackenbush John
Tung Matthew
Yuan GuoCheng
Publication date: 28/06/2018

Biological systems are driven by intricate interactions among the complex array of molecules that comprise the cell. Many methods have been developed to reconstruct network models of those interactions. These methods often draw on large numbers of samples with measured gene expression profiles to infer connections between genes (or gene products). The result is an aggregate network model representing a single estimate for the likelihood of each interaction, or "edge," in the network. While informative, aggregate models fail to capture the heterogeneity that is represented in any population. Here we propose a method to reverse engineer sample-specific networks from aggregate network models. We demonstrate the accuracy and applicability of our approach in several data sets, including simulated data, microarray expression data from synchronized yeast cells, and RNA-seq data collected from human lymphoblastoid cell lines. We show that these sample-specific networks can be used to study changes in network topology across time and to characterize shifts in gene regulation that may not be apparent in expression data. We believe the ability to generate sample-specific networks will greatly facilitate the application of network methods to the increasingly large, complex, and heterogeneous multi-omic data sets that are currently being generated, and ultimately support the emerging field of precision network medicine.
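One way to picture reverse engineering a sample-specific edge from an aggregate model is a leave-one-out comparison: estimate the edge with all samples, estimate it again without sample q, and extrapolate the difference. The abstract does not state a formula, so the linear form below is an assumption made for illustration, and the names `sample_specific_edges` and `mean_product` are hypothetical.

```python
def sample_specific_edges(aggregate, samples):
    """Leave-one-out estimate of one edge weight per sample.

    `aggregate` is any function that maps a list of samples to a single
    edge weight. The estimate for sample q is
        n * (e_all - e_without_q) + e_without_q,
    which assumes the aggregate behaves like an average of per-sample terms.
    """
    n = len(samples)
    e_all = aggregate(samples)
    estimates = []
    for q in range(n):
        e_without_q = aggregate(samples[:q] + samples[q + 1:])
        estimates.append(n * (e_all - e_without_q) + e_without_q)
    return estimates

def mean_product(samples):
    """Toy aggregate edge weight: mean of x * y over (x, y) expression pairs."""
    return sum(x * y for x, y in samples) / len(samples)
```

With the toy `mean_product` aggregate, which is exactly linear in the samples, the estimate for each sample reduces to its own product x * y. For non-linear aggregates such as a correlation coefficient, the same expression gives only an approximation.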

arXiv.org e-Print Archive

Directory of Open Access Journals

NORA - Norwegian Open Research Archives

A High-Throughput DNA Sequence Aligner for Microbial Ecology Studies

Author: John Quackenbush
Patrick D. Schloss
Publication venue: Public Library of Science
Publication date: 01/01/2009

As the scope of microbial surveys expands with the parallel growth in sequencing capacity, a significant bottleneck in data analysis is the ability to generate a biologically meaningful multiple sequence alignment. The most commonly used aligners have varying alignment quality and speed, tend to depend on a specific reference alignment, or lack a complete description of the underlying algorithm. The purpose of this study was to create and validate an aligner with the goal of quickly generating a high quality alignment and having the flexibility to use any reference alignment. Using the simple nearest alignment space termination algorithm, the resulting aligner operates in linear time, requires a small memory footprint, and generates a high quality alignment. In addition, the alignments generated for variable regions were of as high a quality as the alignment of full-length sequences. As implemented, the method was able to align 18 full-length 16S rRNA gene sequences and 58 V2 region sequences per second to the 50,000-column SILVA reference alignment. Most importantly, the resulting alignments were of a quality equal to SILVA-generated alignments. The aligner described in this study will enable scientists to rapidly generate robust multiple sequence alignments that are implicitly based upon the predicted secondary structure of the 16S rRNA molecule. Furthermore, because the implementation is not connected to a specific database, it is easy to generalize the method to reference alignments for any DNA sequence.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Accelerating genomic data publishing and sharing

Author: Fan Jian-Bing
Quackenbush John
Wacek Bart
Publication venue: The Authors. Published by Elsevier Inc.
Publication date: 31/12/2013

Elsevier - Publisher Connector

Inferring steady state single-cell gene expression distributions from analysis of mesoscopic samples

Author: Mar Jessica C
Quackenbush John
Rubio Renee
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: A great deal of interest has been generated by systems biology approaches that attempt to develop quantitative, predictive models of cellular processes. However, the starting point for all cellular gene expression, the transcription of RNA, has not been described and measured in a population of living cells. RESULTS: Here we present a simple model for transcript levels based on Poisson statistics and provide supporting experimental evidence for genes known to be expressed at high, moderate, and low levels. CONCLUSION: Although the model describes a microscopic process occurring at the level of an individual cell, the supporting data we provide use a small number of cells where the echoes of the underlying stochastic processes can be seen. Not only do these data confirm our model, but this general strategy opens up a potential new approach, Mesoscopic Biology, that can be used to assess the natural variability of processes occurring at the cellular level in biological systems.
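A short sketch can show why Poisson variability remains visible in small pools of cells. The abstract says only that the model is based on Poisson statistics; the assumption of independent, identically distributed cells, and the names `poisson_pmf` and `pooled_cv`, are illustrative choices and not taken from the paper.

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson variable with mean lam > 0, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def pooled_cv(lam, cells):
    """Coefficient of variation of the total transcript count in a pool.

    Assumes each of `cells` independent cells carries a Poisson(lam) count,
    so the pooled total is Poisson(lam * cells) and its standard deviation
    divided by its mean is 1 / sqrt(lam * cells).
    """
    return 1.0 / math.sqrt(lam * cells)
```

Under these assumptions the relative fluctuation shrinks with the square root of the pool size, so a gene with a low mean count per cell still shows measurable variability in a pool of tens of cells, while that variability is largely averaged out in very large samples.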

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

University of Queensland eSpace